NSF PAR Search | NSF Public Access Repository


Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes

https://doi.org/10.1287/opre.2021.2237

Hu, Yichun; Kallus, Nathan; Mao, Xiaojie (January 2022, Operations Research)

We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (β ≤ 1), for which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite β), for which rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings, and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems, and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
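For the nondifferentiable regime (β ≤ 1), the strategy of running separate noncontextual bandits in different context regions can be sketched as follows. This is a minimal illustration, not the authors' smoothness-adaptive algorithm: it assumes one-dimensional contexts in [0, 1], equal-width bins, UCB1 as the per-bin bandit, Gaussian reward noise, and a hypothetical simulator `reward_fn`. The function name `binned_ucb` and its parameters are likewise illustrative.

```python
import math
import random


def binned_ucb(contexts, reward_fn, n_arms, n_bins, noise_sd=0.1, seed=0):
    """Run an independent UCB1 instance in each of n_bins equal-width bins of [0, 1].

    contexts: floats in [0, 1], one revealed per round.
    reward_fn(arm, x): mean reward of `arm` at context x (hypothetical simulator).
    Returns the chosen arms and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    counts = [[0] * n_arms for _ in range(n_bins)]
    sums = [[0.0] * n_arms for _ in range(n_bins)]
    chosen, regret = [], 0.0
    for x in contexts:
        b = min(int(x * n_bins), n_bins - 1)  # bin index; x == 1.0 joins the last bin
        untried = [a for a in range(n_arms) if counts[b][a] == 0]
        if untried:
            arm = untried[0]  # play every arm once in this bin before using indices
        else:
            t_bin = sum(counts[b]) + 1  # rounds seen so far in this bin, plus one
            # UCB1 index computed only from this bin's own samples (local information).
            ucb = [
                sums[b][a] / counts[b][a]
                + math.sqrt(2.0 * math.log(t_bin) / counts[b][a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=ucb.__getitem__)
        reward = reward_fn(arm, x) + rng.gauss(0.0, noise_sd)  # noisy observation
        counts[b][arm] += 1
        sums[b][arm] += reward
        chosen.append(arm)
        # Pseudo-regret uses the true means, which only a simulator knows.
        regret += max(reward_fn(a, x) for a in range(n_arms)) - reward_fn(arm, x)
    return chosen, regret


# Usage: arm 0 is best for x > 0.5, arm 1 for x < 0.5.
def mean_reward(arm, x):
    return x if arm == 0 else 1.0 - x


ctx_rng = random.Random(1)
xs = [ctx_rng.random() for _ in range(2000)]
arms, reg = binned_ucb(xs, mean_reward, n_arms=2, n_bins=8)
print(f"cumulative pseudo-regret: {reg:.1f}")
```

The number of bins trades off bias (wide bins average over contexts where different arms are best) against variance (narrow bins see few samples), and its best choice depends on β and the context dimension. Because each bin ignores data from every other bin, this sketch uses purely local information.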
Fast Rates for Contextual Linear Optimization

https://doi.org/10.1287/mnsc.2022.4383

Hu, Yichun; Kallus, Nathan; Mao, Xiaojie (January 2022, Management Science)

Incorporating side observations in decision making can reduce uncertainty and boost performance, but it also requires that we tackle a potentially complex predictive relationship. Although one may use off-the-shelf machine learning methods to separately learn a predictive model and plug it in, a variety of recent methods instead integrate estimation and optimization by fitting the model to directly optimize downstream decision performance. Surprisingly, in the case of contextual linear optimization, we show that the naïve plug-in approach actually achieves regret convergence rates that are significantly faster than those of methods that directly optimize downstream decision performance. We show this by leveraging the fact that specific problem instances do not have arbitrarily bad near-dual-degeneracy. Although there are other pros and cons to consider, as we discuss and illustrate numerically, our results highlight a nuanced landscape for the enterprise of integrating estimation and optimization. Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well. This paper was accepted by Hamid Nazerzadeh, data science.
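The naïve plug-in (estimate-then-optimize) approach can be sketched in two steps: fit a predictive model of the cost vector from the context, then solve the downstream linear problem with the predicted costs. This is an illustrative sketch under stated assumptions: a linear least-squares model with intercept, and a small feasible set given as an explicit list of vertices (here, path incidence vectors). The function names `fit_linear_cost_model` and `plug_in_decision` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_linear_cost_model(X, C):
    """Least-squares fit of cost vectors C (n x m) on contexts X (n x p), with intercept.

    Returns W of shape (p + 1, m) such that np.concatenate([[1.0], x]) @ W predicts c.
    """
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    W, *_ = np.linalg.lstsq(X1, C, rcond=None)
    return W


def plug_in_decision(W, x, decisions):
    """Return the feasible decision z minimizing the predicted cost c_hat(x) @ z.

    decisions: (k x m) array listing the vertices of the feasible region. A linear
    objective over a polytope attains its minimum at a vertex, so enumerating a
    small vertex set is enough; larger problems would call an LP solver instead.
    """
    c_hat = np.concatenate([[1.0], x]) @ W
    return decisions[np.argmin(decisions @ c_hat)]


# Toy instance (hypothetical): 3 edges; path A uses edges 0 and 1, path B uses edge 2.
rng = np.random.default_rng(0)
n, p = 500, 2
X = rng.uniform(-1.0, 1.0, size=(n, p))
B_true = np.array([[1.0, 0.5], [0.5, 1.0], [-1.0, 0.0]])  # true slope per edge (m x p)
C = 2.0 + X @ B_true.T + 0.1 * rng.standard_normal((n, 3))  # observed edge costs

W = fit_linear_cost_model(X, C)  # step 1: estimate
decisions = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
z = plug_in_decision(W, np.array([0.8, 0.8]), decisions)  # step 2: optimize
print("chosen path:", z)
```

Only the estimation step involves learning; any off-the-shelf regression method could replace the least-squares fit without changing the optimization step.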
Fast Rates for the Regret of Offline Reinforcement Learning

Hu, Yichun; Kallus, Nathan; Uehara, Masatoshi (January 2021, Proceedings of the 34th Conference on Learning Theory)
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Hu, Yichun; Kallus, Nathan; Mao, Xiaojie (January 2020, Conference on Learning Theory)
